AITopics | sample dataset


sample dataset


04b98fd38bd42810d0764cb6c46d10d8-Paper-Conference.pdf

Neural Information Processing Systems | Feb-7-2026, 07:19:13 GMT

data vendor, dataset, vendor, (17 more...)

Neural Information Processing Systems

Country:

North America > Canada > Ontario > Toronto (0.14)
Asia > Singapore (0.04)
South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
(4 more...)

Genre:

Research Report > New Finding (0.93)
Research Report > Experimental Study (0.67)

Industry:

Information Technology > Security & Privacy (1.00)
Banking & Finance (1.00)
Government > Regional Government > North America Government > United States Government (0.45)

Technology:

Information Technology > Game Theory (1.00)
Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(4 more...)


Data Distribution Valuation

Neural Information Processing Systems | Oct-9-2025, 17:26:41 GMT

Data valuation is a class of techniques for quantitatively assessing the value of data for applications like pricing in data marketplaces.

data vendor, dataset, vendor, (17 more...)

Neural Information Processing Systems

Country:

North America > Canada > Ontario > Toronto (0.14)
Asia > Singapore (0.04)
South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
(4 more...)

Genre:

Research Report > New Finding (0.93)
Research Report > Experimental Study (0.67)

Industry:

Information Technology > Security & Privacy (1.00)
Banking & Finance (1.00)
Government > Regional Government > North America Government > United States Government (0.45)

Technology:

Information Technology > Game Theory (1.00)
Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(4 more...)


Data Distribution Valuation

Xu, Xinyi, Wang, Shuaiqi, Foo, Chuan-Sheng, Low, Bryan Kian Hsiang, Fanti, Giulia

arXiv.org Artificial Intelligence | Oct-6-2024

Data valuation is a class of techniques for quantitatively assessing the value of data for applications like pricing in data marketplaces. Existing data valuation methods define a value for a discrete dataset. However, in many use cases, users are interested not only in the value of the dataset, but also in that of the distribution from which the dataset was sampled. For example, consider a buyer trying to evaluate whether to purchase data from different vendors. The buyer may observe (and compare) only a small preview sample from each vendor in order to decide which vendor's data distribution is most useful and then purchase from that vendor. The core question is: how should we compare the values of data distributions from their samples? Under a Huber characterization of the data heterogeneity across vendors, we propose a maximum mean discrepancy (MMD)-based valuation method which enables theoretically principled and actionable policies for comparing data distributions from samples. We empirically demonstrate that our method is sample-efficient and effective in identifying valuable data distributions against several existing baselines, on multiple real-world datasets (e.g., network intrusion detection, credit card fraud detection) and downstream applications (classification, regression).
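To make the idea of comparing distributions from samples more concrete, here is a minimal sketch of a squared-MMD estimate between two samples. It is not the paper's valuation method: the Gaussian (RBF) kernel, the bandwidth, the synthetic data, and the notion of a buyer-side `reference` sample are all assumptions made for illustration, and the names `vendor_a` and `vendor_b` are hypothetical.

```python
import math
import random


def rbf_kernel(x, y, bandwidth=1.0):
    """Gaussian (RBF) kernel between two equal-length vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def mmd_squared(xs, ys, bandwidth=1.0):
    """Biased (V-statistic) estimate of the squared MMD between two samples."""
    k_xx = sum(rbf_kernel(a, b, bandwidth) for a in xs for b in xs) / len(xs) ** 2
    k_yy = sum(rbf_kernel(a, b, bandwidth) for a in ys for b in ys) / len(ys) ** 2
    k_xy = sum(rbf_kernel(a, b, bandwidth) for a in xs for b in ys) / (len(xs) * len(ys))
    return k_xx + k_yy - 2.0 * k_xy


def gaussian_sample(rng, mean, n, dim=2):
    """Draw n points from an isotropic unit-variance Gaussian (synthetic data)."""
    return [[rng.gauss(mean, 1.0) for _ in range(dim)] for _ in range(n)]


rng = random.Random(0)
reference = gaussian_sample(rng, 0.0, 200)  # assumed reference sample (hypothetical)
vendor_a = gaussian_sample(rng, 0.1, 50)    # small preview, close to the reference
vendor_b = gaussian_sample(rng, 1.5, 50)    # small preview, far from the reference

mmd_a = mmd_squared(vendor_a, reference)
mmd_b = mmd_squared(vendor_b, reference)
print(f"vendor A: {mmd_a:.4f}, vendor B: {mmd_b:.4f}")
```

In this toy setup a smaller estimate suggests that a vendor's preview sample looks more like the reference, which is one plausible way to rank vendors from small previews; how the paper turns MMD into a value is not specified in the abstract.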

data vendor, dataset, vendor, (16 more...)

arXiv.org Artificial Intelligence

2410.04386

Country:

North America > Canada > Ontario > Toronto (0.14)
South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
(4 more...)

Genre: Research Report > New Finding (0.67)

Industry:

Law Enforcement & Public Safety (1.00)
Information Technology > Security & Privacy (1.00)
Banking & Finance (1.00)
Government > Regional Government > North America Government > United States Government (0.45)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Game Theory (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(3 more...)


Efficient and Accurate Explanation Estimation with Distribution Compression

Baniecki, Hubert, Casalicchio, Giuseppe, Bischl, Bernd, Biecek, Przemyslaw

arXiv.org Machine Learning | Jun-26-2024

Exact computation of various machine learning explanations requires numerous model evaluations and in extreme cases becomes impractical. The computational cost of approximation increases with the ever-increasing size of data and model parameters. Many heuristics have been proposed to approximate post-hoc explanations efficiently. This paper shows that the standard i.i.d. sampling used in a broad spectrum of algorithms for explanation estimation leads to an approximation error worthy of improvement. To this end, we introduce Compress Then Explain (CTE), a new paradigm for more efficient and accurate explanation estimation. CTE uses distribution compression through kernel thinning to obtain a data sample that best approximates the marginal distribution. We show that CTE improves the estimation of removal-based local and global explanations with negligible computational overhead. It often achieves an on-par explanation approximation error using 2-3x fewer samples, i.e., requiring 2-3x fewer model evaluations. CTE is a simple, yet powerful, plug-in for any explanation method that now relies on i.i.d. sampling.

dataset, efficient and accurate explanation estimation, explanation, (7 more...)

arXiv.org Machine Learning

2406.18334

Country:

Europe > Austria > Vienna (0.14)
North America > United States > California (0.04)
Europe > Poland > Masovia Province > Warsaw (0.04)
(2 more...)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Data Science > Data Mining (0.92)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)


Utilizing Large Language Models to Identify Reddit Users Considering Vaping Cessation for Digital Interventions

Vuruma, Sai Krishna Revanth, Wu, Dezhi, Gupta, Saborny Sen, Aust, Lucas, Lookingbill, Valerie, Henry, Caleb, Ren, Yang, Kasson, Erin, Chen, Li-Shiun, Cavazos-Rehg, Patricia, Hu, Dian, Huang, Ming

arXiv.org Artificial Intelligence | Apr-25-2024

The widespread adoption of social media platforms globally not only enhances users' connectivity and communication but also emerges as a vital channel for the dissemination of health-related information, thereby establishing social media data as an invaluable organic data resource for public health research. The surge in popularity of vaping or e-cigarette use in the United States and other countries has caused an outbreak of e-cigarette and vaping use-associated lung injury (EVALI), leading to hospitalizations and fatalities in 2019, highlighting the urgency to comprehend vaping behaviors and develop effective strategies for cessation. In this study, we extracted a sample dataset from one vaping sub-community on Reddit to analyze users' quit-vaping intentions. Leveraging large language models, including both the latest GPT-4 and traditional BERT-based language models, for sentence-level quit-vaping intention prediction tasks, this study compares the outcomes of these models against human annotations. Notably, when compared to human evaluators, the GPT-4 model demonstrates superior consistency in adhering to annotation guidelines and processes, showcasing advanced capabilities to detect nuanced user quit-vaping intentions that human evaluators might overlook. These preliminary findings emphasize the potential of GPT-4 in enhancing the accuracy and reliability of social media data analysis, especially in identifying subtle user intentions that may elude human detection.

annotation, dataset, human evaluator, (12 more...)

arXiv.org Artificial Intelligence

2404.17607

Country:

North America > United States > South Carolina > Richland County > Columbia (0.14)
North America > United States > Texas > Harris County > Houston (0.04)
North America > United States > Missouri > St. Louis County > St. Louis (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (1.00)

Industry:

Health & Medicine > Public Health (1.00)
Consumer Products & Services > Food, Beverage, Tobacco & Cannabis (1.00)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)


Direct Zernike Coefficient Prediction from Point Spread Functions and Extended Images using Deep Learning

Kok, Yong En, Bentley, Alexander, Parkes, Andrew, Wright, Amanda J., Somekh, Michael G., Pound, Michael

arXiv.org Artificial Intelligence | Apr-24-2024

Optical imaging quality can be severely degraded by system- and sample-induced aberrations. Existing adaptive optics systems typically rely on iterative search algorithms to correct for aberrations and improve images. This study demonstrates the application of convolutional neural networks to characterise the optical aberration by directly predicting the Zernike coefficients from two to three phase-diverse optical images. We evaluated our network on 600,000 simulated Point Spread Function (PSF) datasets randomly generated within the range of -1 to 1 radians using the first 25 Zernike coefficients. The results show that using only three phase-diverse images captured above, below and at the focal plane with an amplitude of 1 achieves a low RMSE of 0.10 radians on the simulated PSF dataset. Furthermore, this approach directly predicts Zernike modes for simulated extended 2D samples, while maintaining a comparable RMSE of 0.15 radians. We demonstrate that this approach is effective using only a single prediction step, or can be iterated a small number of times. This simple and straightforward technique provides a rapid and accurate method for predicting the aberration correction using three or fewer phase-diverse images, paving the way for evaluation on real-world datasets.

aberration, dataset, zernike coefficient, (10 more...)

arXiv.org Artificial Intelligence

2404.15231

Country:

Europe > United Kingdom > England > Nottinghamshire > Nottingham (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
Europe > Poland (0.04)
Asia > China > Zhejiang Province > Hangzhou (0.04)

Genre: Research Report > New Finding (0.88)

Industry: Health & Medicine > Therapeutic Area (0.70)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)


Factoring Hate Speech: A New Annotation Framework to Study Hate Speech in Social Media

Ron, Gal, Levi, Effi, Oshri, Odelia, Shenhav, Shaul R.

arXiv.org Artificial Intelligence | Nov-7-2023

Although this annotation scheme was designed to capture and characterize hate speech directed towards Jews, with the exception of one group-specific aspect, it is general enough to be applied to any other group-directed hate speech. Social media has come to constitute a space for the propagation of hostility (see ElSherief et al., 2018, p. 1) and provides fertile grounds for the radicalization of individuals in support of violent extremist groups (Reynolds and Tuck, 2016; Mitts, ...

expression, speech, tweet, (15 more...)

arXiv.org Artificial Intelligence

doi: 10.18653/v1/2023.woah-1.21

2311.03969

Country:

North America > United States > Washington > King County > Seattle (0.14)
Asia > Middle East > Israel > Jerusalem District > Jerusalem (0.05)
North America > Canada > Ontario > Toronto (0.04)
Europe > France > Provence-Alpes-Côte d'Azur > Bouches-du-Rhône > Marseille (0.04)

Genre: Research Report (0.50)

Industry: Law Enforcement & Public Safety > Terrorism (0.67)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence (1.00)


Understanding Unsupervised Machine Learning

#artificialintelligence | Feb-18-2023, 05:25:18 GMT

In supervised machine learning, we have a labeled dataset that is used to train the model. For example, we train a model to predict the prices of houses based on features such as area, number of bedrooms, and location. In unsupervised machine learning, we do not have a labeled dataset. The goal of unsupervised machine learning is to find patterns and relationships in data. Clustering is one of the most popular techniques used in unsupervised machine learning.
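As an illustration of clustering without labels, the following is a minimal sketch of k-means (Lloyd's algorithm) in plain Python. The synthetic two-blob data, the random initialization, and the fixed iteration count are assumptions for the example and are not taken from the article.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def k_means(points, k, iterations=20, seed=0):
    """Alternate between assigning points to centroids and recomputing centroids."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialize from k distinct data points
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(coord) / len(members) for coord in zip(*members)]
    return centroids, labels


# Two well-separated synthetic blobs; note that no labels are given to k_means.
rng = random.Random(1)
blob_a = [[rng.gauss(0.0, 0.5), rng.gauss(0.0, 0.5)] for _ in range(30)]
blob_b = [[rng.gauss(5.0, 0.5), rng.gauss(5.0, 0.5)] for _ in range(30)]
points = blob_a + blob_b

centroids, labels = k_means(points, k=2)
print(centroids)
```

The cluster indices themselves are arbitrary; what matters is that points from the same blob end up sharing an index.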

algorithm, dataset, k-means, (10 more...)

#artificialintelligence

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)


Interactive Pipeline and Composite Estimators for Your End-to-End ML Model - Open Data Science - Your News Source for AI, Machine Learning & more

#artificialintelligence | Feb-15-2023, 14:00:44 GMT

A data science model development pipeline involves various components including data injection, data preprocessing, feature engineering, feature scaling, and modeling. A data scientist needs to write the learning and inference code for all the components. In machine learning projects with heterogeneous data, the code structure can become messy and difficult for other team members to interpret. A pipeline is a very handy tool that can sequentially assemble all your model development components. Using a pipeline, one can easily perform the learning and inference tasks in a comparatively cleaner code structure.
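The idea of chaining components behind a single fit and predict interface can be sketched in a few lines of plain Python. This is a toy stand-in and not the article's code: the classes `StandardScaler1D`, `ThresholdClassifier`, and `Pipeline` below are hypothetical, written only for this example, and the data is made up. Libraries such as scikit-learn provide a full-featured `Pipeline` class for real projects.

```python
class StandardScaler1D:
    """Toy preprocessing step: rescale values to zero mean and unit variance."""

    def fit(self, xs):
        self.mean = sum(xs) / len(xs)
        variance = sum((x - self.mean) ** 2 for x in xs) / len(xs)
        self.std = variance ** 0.5 or 1.0  # avoid division by zero
        return self

    def transform(self, xs):
        return [(x - self.mean) / self.std for x in xs]


class ThresholdClassifier:
    """Toy model: predict 1 above the midpoint between the two class means."""

    def fit(self, xs, ys):
        positives = [x for x, y in zip(xs, ys) if y == 1]
        negatives = [x for x, y in zip(xs, ys) if y == 0]
        self.threshold = (
            sum(positives) / len(positives) + sum(negatives) / len(negatives)
        ) / 2
        return self

    def predict(self, xs):
        return [1 if x > self.threshold else 0 for x in xs]


class Pipeline:
    """Apply each transformer in order, then hand the result to the model."""

    def __init__(self, transformers, model):
        self.transformers = transformers
        self.model = model

    def fit(self, xs, ys):
        for step in self.transformers:
            xs = step.fit(xs).transform(xs)  # learn and apply each step
        self.model.fit(xs, ys)
        return self

    def predict(self, xs):
        for step in self.transformers:
            xs = step.transform(xs)  # reuse statistics learned during fit
        return self.model.predict(xs)


train_x = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
train_y = [0, 0, 0, 1, 1, 1]

pipeline = Pipeline([StandardScaler1D()], ThresholdClassifier()).fit(train_x, train_y)
predictions = pipeline.predict([2.5, 9.5])
print(predictions)
```

Because learning and inference both go through the same object, the preprocessing applied at prediction time cannot drift from what was learned during training, which is the main readability and correctness benefit the passage describes.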

dataset, estimator, pipeline, (11 more...)

#artificialintelligence

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)


Interactive Pipeline and Composite Estimators for Your End-to-End ML Model

#artificialintelligence | Dec-21-2022, 03:26:12 GMT

Posted by ODSC Community, November 3, 2022. A data science model development pipeline involves various components including data injection, data preprocessing, feature engineering, feature scaling, and modeling. A data scientist needs to write the learning and inference code for all the components. In machine learning projects with heterogeneous data, the code structure can become messy and difficult for other team members to interpret. A pipeline is a very handy tool that can sequentially assemble all your model development components.

interactive pipeline and composite estimator, pipeline, sklearn, (10 more...)

#artificialintelligence

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.31)
